Shannon's classical information theory [18] uses probability theory to analyze channels as
mechanisms for information flow. In this paper, we generalize results from [14] for binary channels to
show how some more modern tools, probabilistic monads and domain theory in particular, can
be used to model classical channels. As initiated in [14], the point of departure is to consider the
family of channels with fixed inputs and outputs, rather than trying to analyze channels one at a time.
The results show that domain theory has a role to play in the capacity of channels; in particular, the
n x n-stochastic matrices, which are the classical channels having the same sized input as output,
admit a quotient compact ordered space which is a domain, and the capacity map factors through this
quotient via a Scott-continuous map that measures the quotient domain. We also comment on how
some of our results relate to recent discoveries about quantum channels and free affine monoids.

1 Introduction 

Classical information theory has its foundations in the seminal work of Claude Shannon [18], who first
conceived of analyzing the behavior of channels using entropy and deriving a formula for channel ca- 
pacity based on mutual information (cf. @ for a modern presentation of the basic results). Recent work 
of Martin et al. [14] reveals that the theory of compact, affine monoids and domain theory can be used
to analyze the family of binary channels. In this paper, our goal is to generalize the results in [14] to the
case of n x n-channels, that is, channels that have n input ports and n output ports. Our approach also uses
the monadic properties of probability distributions to give an abstract presentation of how channels arise, 
and that clarifies the role of the doubly stochastic matrices, which are special channels. While our work 
focuses on the classical case, the situation around quantum information and quantum channels is also a 
concern, and we point out how our results relate to some recent work [7, 15] on quantum qubit channels
and free affine monoids. While most of the ingredients we piece together are not new, we believe the 
approach we present does represent a new way in which to understand families of channels and some of 
their important features. 

The rest of the paper is structured as follows. In the next section, we describe three monads based
on the probability measures over compact spaces, compact monoids and compact groups. Each of these
is used to present some aspect of the classical channels. We then introduce topology, and show how
the capacity of a channel can be viewed from a topological perspective. The main result here is that
capacity is the maximum distance between the surface determined by the entropy function and the
underlying polytope generated by the rows of a channel matrix, viewed as vectors in $\mathbb{R}^n$ for appropriate n.
This leads to a generalization of Jensen's Lemma that characterizes strictly concave functions. Domain
theory is then introduced, as applied to the finitely-generated polytopes residing in a compact convex set,
ordered by reverse inclusion. Here we characterize when proper maps measure a domain, in the sense of
Martin [13]; a closely related result can be found in [16]. Finally, we return to the compact monoid of
n x n-stochastic matrices and show that it has a natural, algebraically-defined pre-order relative to which
capacity measures the quotient partial order, which is a compact ordered space. The capacity mapping
is also shown to be strictly monotone with respect to this pre-order, which means that strictly smaller
channel matrices have strictly smaller capacity. We close with a summary and comments about future
work.

2 Three probabilistic monads 

The categorical presentation of classical information relies on three monads, each of which has the 
family Prob(X) of probability distributions over a set X as the object-level of the left adjoint. The first of 
these starts with compact Hausdorff spaces, and uses several results from functional analysis: standard 
references for this material are HUT]]. We present these monads in turn: 

2.1 A spatial monad 

We begin with the probability measure monad over topological spaces. If X is a compact Hausdorff
space, then C(X,R), the family of continuous, real-valued functions defined on X, is a Banach space
(complete, normed linear space) in the sup-norm. The Banach space dual of C(X,R), denoted C(X,R)*,
consists of all continuous linear functionals from C(X,R) into R. C(X,R)* is another Banach space,
and the Riesz Representation Theorem implies this is the Banach space of Radon measures on X (those
that are both inner and outer regular). The positive elements of the unit sphere of C(X,R)* form the
family Prob(X) of probability measures over X. If we endow Prob(X) with the weak* topology (the weakest
topology making each evaluation map $\mu \mapsto \int f\,d\mu$, for f in C(X,R), continuous), then Prob(X)
becomes a compact, Hausdorff space, by the Banach-Alaoglu Theorem. Prob extends to a functor
$\mathrm{Prob}_S\colon \mathrm{Comp} \to \mathrm{CompConv}_{LC}$ from the category of compact Hausdorff spaces and continuous maps,
to the category of compact, convex, locally convex spaces and continuous affine maps, via
$\mathrm{Prob}_S(X) = \mathrm{Prob}(X)$, and $f\colon X \to Y$ maps to $\mathrm{Prob}_S(f)\colon \mathrm{Prob}_S(X) \to \mathrm{Prob}_S(Y)$ by
$\mathrm{Prob}_S(f)(\mu)(A) = \mu(f^{-1}(A))$, for each Borel set $A \subseteq Y$.

Moreover, the mapping $x \mapsto \delta_x\colon X \to C(X,R)^*$, sending a point to the Dirac measure it defines,
is a continuous mapping into the weak* topology. Since X is compact Hausdorff, Urysohn's Lemma
implies C(X,R) separates the points of X, and so $x \mapsto \delta_x$ is a homeomorphism onto its image. Another
application of Urysohn's Lemma shows each Dirac measure is an extreme point of Prob(X), and in fact the
Dirac measures form the set of extreme points of Prob(X).

A simple measure is a finite, convex combination of Dirac measures, i.e., one of the form $\sum_{i \le n} r_i \delta_{x_i}$,
where $r_i \ge 0$, $\sum_i r_i = 1$ and $x_i \in X$ for each i. We let $\mathrm{Prob}_{sim}(X)$ denote this family. The Krein-Milman
Theorem implies that $\mathrm{Prob}_{sim}(X)$ is weak* dense among the probability measures. So, if $f\colon X \to C$ is
a continuous function from X into a compact, convex subset of a locally convex vector space, then the function
$\hat{f}(\delta_x) = f(x)$ extends uniquely to a continuous affine function $\hat{f}(\sum_{i \le n} r_i \delta_{x_i}) = \sum_{i \le n} r_i f(x_i)$
on $\mathrm{Prob}_{sim}(X)$, and then to all of Prob(X), by the density of the simple measures. By construction,
$\hat{f}(\delta_x) = f(x)$.
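
To make the constructions on simple measures concrete, here is a minimal Python sketch. The dictionary representation of a simple measure and the helper names `dirac`, `simple`, `pushforward` and `extend` are our own illustrative choices, not notation from the text; exact rational weights are used so that equalities can be checked literally.

```python
from fractions import Fraction

def dirac(x):
    """Dirac measure at x, stored as {point: weight}."""
    return {x: Fraction(1)}

def simple(*terms):
    """Convex combination sum_i r_i * delta_{x_i} built from (r_i, x_i) pairs."""
    mu = {}
    for r, x in terms:
        mu[x] = mu.get(x, Fraction(0)) + Fraction(r)
    assert all(w >= 0 for w in mu.values()) and sum(mu.values()) == 1
    return mu

def pushforward(f, mu):
    """Prob(f)(mu)(A) = mu(f^{-1}(A)): the weight of x is moved onto f(x)."""
    nu = {}
    for x, w in mu.items():
        nu[f(x)] = nu.get(f(x), Fraction(0)) + w
    return nu

def extend(f, mu):
    """Affine extension of a real-valued f: sum_i r_i * f(x_i)."""
    return sum(w * f(x) for x, w in mu.items())

mu = simple((Fraction(1, 2), 0), (Fraction(1, 4), 1), (Fraction(1, 4), 2))
parity = pushforward(lambda x: x % 2, mu)   # the weights of 0 and 2 merge
mean = extend(lambda x: Fraction(x), mu)    # value of the extension at mu
```

On a Dirac measure the extension returns the value of the original function, which is the finite-support instance of the unit property described above.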

We conclude that the functor $\mathrm{Prob}_S$ is left adjoint to the forgetful functor. In fact, $\mathrm{Prob}_S$
defines a monad, where the unit of the adjunction is the mapping $\eta_X(x) = \delta_x$ and the multiplication
$\mu\colon \mathrm{Prob}(\mathrm{Prob}(X)) \to \mathrm{Prob}(X)$ is integration.

Theorem 1. The functor $\mathrm{Prob}_S$ sending a compact space to its family of probability measures in the weak*
topology defines a monad on the category Comp. The unit of the monad sends a point $x \in X$ to the Dirac
measure $\delta_x$, and the image of the unit is the set of extreme points in Prob(X).

Definition 1. Let X and Y be compact Hausdorff spaces. A (lossless) noisy channel from X to Y is a
mapping $f\colon X \to \mathrm{Prob}(Y)$.

Since $\mathrm{Prob}_S$ is a monad, each channel $f\colon X \to \mathrm{Prob}(Y)$ corresponds uniquely to a continuous, affine
mapping $\mathrm{Prob}_S(f)\colon \mathrm{Prob}(X) \to \mathrm{Prob}(Y)$ in the Kleisli category $\mathcal{K}_{\mathrm{Prob}_S}$ of $\mathrm{Prob}_S$.

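Example 1. Let $n \ge 1$ and let $\underline{n} = \{0, \ldots, n-1\}$ be the discrete, compact space. Then $\mathrm{Prob}(\underline{n})$ is
the family of probability distributions on n points, and given $m \ge 1$, a channel $f\colon \underline{m} \to \mathrm{Prob}(\underline{n})$ is an
m x n-stochastic matrix. The family ST(m,n) of m x n-stochastic matrices is then the family of lossless,
noisy channels from $\underline{m}$ to $\underline{n}$. Moreover, from our comment about the Kleisli category $\mathcal{K}_{\mathrm{Prob}_S}$, we
conclude that the family of morphisms $\mathcal{K}_{\mathrm{Prob}_S}(\underline{m}, \mathrm{Prob}(\underline{n}))$ is $\mathrm{ST}(m,n) \simeq \mathrm{Aff}(\mathrm{Prob}(\underline{m}), \mathrm{Prob}(\underline{n}))$, where
$\mathrm{Aff}(\mathrm{Prob}(\underline{m}), \mathrm{Prob}(\underline{n}))$ is the family of continuous affine maps from $\mathrm{Prob}(\underline{m})$ to $\mathrm{Prob}(\underline{n})$.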

This first probabilistic monad shows that classical channels correspond to mappings in the Kleisli
category of the "spatial" monad $\mathrm{Prob}_S$ on the category Comp. If we let m = n, then $\mathrm{ST}(n) \stackrel{\mathrm{def}}{=} \mathrm{ST}(n,n)$
is also a monoid using composition in the Kleisli category: if $f, g \in \mathrm{ST}(n)$, and $\hat{g}\colon \mathrm{Prob}(\underline{n}) \to \mathrm{Prob}(\underline{n})$
is the extension of g, then $g \circ f ::= \hat{g} \circ f \in \mathrm{ST}(n)$. We next present a second monad that gives another
account of this special case.
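
As an illustration of channels as Kleisli morphisms in the finite case, the following Python sketch treats a channel as a row-stochastic matrix and its Kleisli extension as multiplication of a row vector by that matrix. The helper names and the two sample channels are hypothetical, and the convention that `compose(C, D)` means "first C, then D" is our assumption about how the composite is written as a matrix product.

```python
from fractions import Fraction as F

def is_stochastic(C):
    """Every row is a probability distribution."""
    return all(all(c >= 0 for c in row) and sum(row) == 1 for row in C)

def apply(p, C):
    """Kleisli extension of C: the output distribution p . C."""
    return [sum(p[i] * C[i][j] for i in range(len(C))) for j in range(len(C[0]))]

def compose(C, D):
    """Kleisli composite 'first C, then D': each row of C is pushed through D."""
    return [apply(row, D) for row in C]

C = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]   # hypothetical 2 x 2 channel
D = [[F(1, 2), F(1, 2)], [F(0), F(1)]]           # another hypothetical channel
I = [[F(1), F(0)], [F(0), F(1)]]                 # the unit: i |-> delta_i
p = [F(1, 3), F(2, 3)]
CD = compose(C, D)
```

The composite is again stochastic, the unit acts as an identity, and applying the two channels in turn agrees with applying the composite, which is the monoid structure on ST(n) in this small case.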



2.2 A monad on monoids 

The second monad we define is based on the category CMon of compact monoids and compact monoid
homomorphisms. More precisely, a compact monoid is a monoid S (a non-empty set endowed with
an associative binary operation $(x,y) \mapsto xy\colon S \times S \to S$ that also has an identity element, $1_S$) that is also
a compact Hausdorff space for which the multiplication is continuous. We can apply the probability
functor to such an S to obtain the compact convex (Hausdorff) space Prob(S) of probability measures on
S. If we denote multiplication on S by $\cdot$, then $\mathrm{Prob}(\cdot)\colon \mathrm{Prob}(S \times S) \to \mathrm{Prob}(S)$, and since
$\iota_S\colon \mathrm{Prob}(S) \times \mathrm{Prob}(S) \hookrightarrow \mathrm{Prob}(S \times S)$ is an embedding, we have a continuous affine map
$\mathrm{Prob}(\cdot) \circ \iota_S\colon \mathrm{Prob}(S) \times \mathrm{Prob}(S) \to \mathrm{Prob}(S)$. This map is called convolution, and we denote
$(\mathrm{Prob}(\cdot) \circ \iota_S)(\mu,\nu) = \mu *_S \nu$. It is
routine to show convolution is associative, so Prob(S) is a compact affine monoid.

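If $\varphi\colon S \to T$ is a morphism of compact semigroups, then $\mathrm{Prob}(\varphi)\colon \mathrm{Prob}(S) \to \mathrm{Prob}(T)$ is defined
by $\mathrm{Prob}(\varphi)(\mu)(f) = \int_S f \circ \varphi \, d\mu$ for any $f\colon T \to \mathbb{R}$. If $\mu, \nu \in \mathrm{Prob}(S)$ and $f \in C(T,\mathbb{R})$, then

$$\begin{aligned}
\mathrm{Prob}(\varphi)(\mu *_S \nu)(f) &= \int_S (f \circ \varphi)\, d(\mu *_S \nu) = \int_S \int_S f \circ \varphi \circ m_S \, d\mu\, d\nu \\
&= \int_S \int_S f \circ m_T \circ (\varphi \times \varphi)\, d\mu\, d\nu \\
&= \big(\mathrm{Prob}(\varphi)(\mu) *_T \mathrm{Prob}(\varphi)(\nu)\big)(f),
\end{aligned}$$

where $m_S\colon S \times S \to S$, $m_T\colon T \times T \to T$ are the semigroup operations, $*_S$, $*_T$ denote convolution, and
where the third equality follows from the fact that $\varphi$ is a homomorphism. Thus
$\mathrm{Prob}(\varphi)\colon (\mathrm{Prob}(S), *_S) \to (\mathrm{Prob}(T), *_T)$
is a semigroup homomorphism. Finally, the fact that $\mathrm{Prob}(\varphi)$ preserves the identity follows from the
observation that $\delta_x *_S \delta_y = \delta_{xy}$, which implies that $\delta_{1_S}$ is an identity for the simple measures, and
consequently for all measures since the simple measures are dense.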
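
For a finite monoid, convolution reduces to a finite sum, and the homomorphism property can be checked directly. The sketch below is only an illustration under our own assumptions: measures are dictionaries of exact weights, the monoids are the additive groups Z_6 and Z_3 (chosen purely for convenience), and `phi` is reduction mod 3.

```python
from fractions import Fraction as F
from itertools import product

def convolve(mu, nu, op):
    """mu * nu: pushforward of the product measure along the monoid operation."""
    out = {}
    for (x, a), (y, b) in product(mu.items(), nu.items()):
        z = op(x, y)
        out[z] = out.get(z, F(0)) + a * b
    return out

def push(phi, mu):
    """Prob(phi)(mu): move the weight of x onto phi(x)."""
    out = {}
    for x, w in mu.items():
        out[phi(x)] = out.get(phi(x), F(0)) + w
    return out

add6 = lambda x, y: (x + y) % 6
add3 = lambda x, y: (x + y) % 3
phi = lambda x: x % 3              # a monoid homomorphism Z_6 -> Z_3

mu = {0: F(1, 2), 1: F(1, 2)}
nu = {2: F(1, 3), 5: F(2, 3)}
lhs = push(phi, convolve(mu, nu, add6))                 # Prob(phi)(mu * nu)
rhs = convolve(push(phi, mu), push(phi, nu), add3)      # Prob(phi)(mu) * Prob(phi)(nu)
```

The two sides agree, Dirac measures convolve to the Dirac measure at the product, and the Dirac measure at the identity is an identity for convolution.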

It follows that restricting $\mathrm{Prob}_S$ to the subcategory CMon of Comp yields a functor $\mathrm{Prob}_M\colon \mathrm{CMon} \to \mathrm{CAM}$
into the category of locally convex compact affine monoids and continuous affine monoid maps.

Theorem 2. The restriction of $\mathrm{Prob}_S$ to CMon induces a monad $\mathrm{Prob}_M$ whose target is CAM, the category
of locally convex compact affine monoids and continuous, affine monoid homomorphisms. The unit of
the monad is again the Dirac map, and its image is again the set of extreme points of Prob(S).

Applying the same reasoning as for $\mathrm{Prob}_S$, we see that each continuous monoid homomorphism
$\varphi\colon S \to \mathrm{Prob}(T)$ corresponds to a unique morphism of compact affine monoids, $\hat{\varphi}\colon \mathrm{Prob}(S) \to \mathrm{Prob}(T)$,
in the Kleisli category $\mathcal{K}_{\mathrm{Prob}_M}$. However, in relation to classical channels, our interest is in the object
level of $\mathrm{Prob}_M$:

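Example 2. We return to the example ST(n) of stochastic n x n-matrices. These arise as channels on a
discrete, n-element set. For such a set $\underline{n}$, the selfmaps of $\underline{n}$ form a finite (hence compact) monoid. If we
denote this monoid by $[\underline{n} \to \underline{n}]$, then applying $\mathrm{Prob}_M$ we obtain a compact affine monoid $\mathrm{Prob}_M([\underline{n} \to \underline{n}])$.

Now, $[\underline{n} \to \underline{n}] \hookrightarrow [\underline{n} \to \mathrm{Prob}(\underline{n})]$ by $f \mapsto \eta_{\underline{n}} \circ f$, where $\eta_{\underline{n}}$ is the unit for $\mathrm{Prob}_S$. Since
$\mathrm{ST}(n) = [\underline{n} \to \mathrm{Prob}(\underline{n})]$ is a compact affine monoid, this mapping extends to a morphism of compact affine monoids

$$\sum_{i \le k} r_i \delta_{f_i} \mapsto \sum_{i \le k} r_i (\eta_{\underline{n}} \circ f_i) \colon \mathrm{Prob}_M([\underline{n} \to \underline{n}]) \to \mathrm{ST}(n).$$

Since $\{\eta_{\underline{n}} \circ f \mid f \in [\underline{n} \to \underline{n}]\}$ is the set of extreme points of ST(n), this morphism is surjective. In fact this
map is an isomorphism. Thus ST(n) is the free compact affine monoid over $[\underline{n} \to \underline{n}]$.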
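
The map in Example 2 can be traced by hand for n = 2. The following Python sketch is a hypothetical illustration: selfmaps are stored as tuples, `matrix(f)` builds the 0/1 stochastic matrix corresponding to the composite of the unit with f, `then` composes maps in the order "first f, then g" (our assumption about the convention), and `mix` sends a finitely supported measure on selfmaps to the corresponding convex combination of matrices.

```python
from fractions import Fraction as F
from itertools import product

n = 2
maps = list(product(range(n), repeat=n))   # f stored as the tuple (f(0), ..., f(n-1))

def matrix(f):
    """The channel eta o f: row i is the Dirac distribution at f(i)."""
    return [[F(1) if j == f[i] else F(0) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def then(f, g):
    """Composite selfmap: first f, then g."""
    return tuple(g[f[i]] for i in range(n))

def mix(weights):
    """Image of sum_i r_i delta_{f_i}: the matrix sum_i r_i (eta o f_i)."""
    return [[sum(r * matrix(f)[i][j] for f, r in weights.items()) for j in range(n)]
            for i in range(n)]

C = mix({(0, 0): F(1, 2), (0, 1): F(1, 4), (1, 1): F(1, 4)})
```

Composition of selfmaps corresponds to multiplication of their 0/1 matrices, and every convex combination of these matrices is row-stochastic, which is the sense in which the deterministic channels generate ST(n).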

2.3 A monad over compact groups 

Our final use of Prob to define a monad starts with CGrp, the category of compact groups and continuous
group homomorphisms. Since CGrp is a subcategory of CMon, we know that applying $\mathrm{Prob}_M$ to a
compact group yields a compact affine monoid. However, $\mathrm{Prob}_M(G)$ is not a group in general, so the
forgetful functor from CMon does not take $\mathrm{Prob}_M(G)$ to a group, but instead yields a compact monoid.

But, when applied to a compact group G qua compact monoid, the unit of the monad $\mathrm{Prob}_M$ sends
each $g \in G$ to $\delta_g \in \mathrm{Prob}(G)$, and this is a monoid (hence group) homomorphism. So, we define
a new functor $H\colon \mathrm{CMon} \to \mathrm{CGrp}$ by $H(S) = H(1_S)$, the group of units¹ of the compact monoid S. If
$\varphi\colon S \to T$ is a morphism of compact affine monoids, then $\varphi|_{H(1_S)}\colon H(1_S) \to H(1_T)$ is a morphism of
compact groups, so H defines a functor.

Theorem 3. The functor $H\colon \mathrm{CAM} \to \mathrm{CGrp}$ is right adjoint to the functor $\mathrm{Prob}_G\colon \mathrm{CGrp} \to \mathrm{CAM}$. In
fact, the composition $H \circ \mathrm{Prob}_G$ defines a monad on CGrp. Moreover, for any compact group G, we have
$H(\mathrm{Prob}_G(G)) \simeq G$. Again, the unit of the monad is the Dirac map, and the image of G in $\mathrm{Prob}_G(G)$ is the
set of extreme points.

Example 3. We again consider the case of classical channels. Here, given $n \ge 1$, ST(n) has for its group
of units the permutation group S(n). Applying $\mathrm{Prob}_G$, we find that $\mathrm{Prob}_G(S(n))$ is the free compact affine
monoid over the group S(n). But this is just the family DT(n) of doubly stochastic n x n-matrices.

We can use information about $\mathrm{Prob}_G(S(n))$ to conclude information about DT(n). Wendel's Theorem
[20] states that, for a compact group G, the compact monoid Prob(G) has $\{\delta_g \mid g \in G\}$ as its group
of units, and the minimal ideal (every compact monoid has one; cf. [10]) is a zero, which in fact is
Haar measure on G. In the case of S(n), this reaffirms that the units of DT(n) are the permutations of n,
and that DT(n) has a zero, which is the equidistribution $\sum_{i \le n} \frac{1}{n}\delta_i$.
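
The statements about DT(n) can be checked numerically for n = 3. In this Python sketch (helper names are ours, and weights are exact rationals), `mix` forms a convex combination of permutation matrices, and `haar` is the image of the uniform measure on S(3), which should be the matrix with every entry 1/3.

```python
from fractions import Fraction as F
from itertools import permutations

n = 3
perms = list(permutations(range(n)))

def perm_matrix(pi):
    """0/1 matrix with a 1 in position (i, pi[i])."""
    return [[F(1) if pi[i] == j else F(0) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mix(weights):
    """Convex combination sum_pi r_pi * M_pi of permutation matrices."""
    return [[sum(r * perm_matrix(pi)[i][j] for pi, r in weights.items())
             for j in range(n)] for i in range(n)]

def doubly_stochastic(D):
    rows = all(sum(row) == 1 for row in D)
    cols = all(sum(D[i][j] for i in range(n)) == 1 for j in range(n))
    return rows and cols and all(x >= 0 for row in D for x in row)

haar = mix({pi: F(1, len(perms)) for pi in perms})      # uniform measure on S(3)
D = mix({(0, 1, 2): F(1, 2), (1, 0, 2): F(1, 3), (2, 1, 0): F(1, 6)})
```

Multiplying any such D by `haar` on either side returns `haar`, which is the zero property described above for this particular example.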

Corollary 1. If G is a finite group, then $\mathrm{Prob}_G(G)$ is the free affine monoid over G, as well as being
the free compact affine monoid over G.

¹ A unit of a monoid is an element that has a two-sided inverse with respect to the identity $1_S$. The set of units
$H(1_S) = \{x \in S \mid (\exists y \in S)\ xy = yx = 1_S\}$ forms the largest subgroup of S that has $1_S$ as the identity; if S is
compact, then so is the group of units.



Mislove 



91 



Proof. If G is a finite group, then $\mathrm{Prob}(G) = \{\sum_{i \le k} r_i \delta_{g_i} \mid k \in \mathbb{N},\ r_i \in [0,1],\ \sum_i r_i = 1 \wedge g_i \in G\}$ consists of
simple measures. If S is an affine monoid and $\varphi\colon G \to S$ is a monoid homomorphism, then $\varphi(G) \subseteq H(1_S)$,
and so $\varphi\colon G \to H(1_S)$ is a group homomorphism. Then $\hat{\varphi}(\sum_{i \le k} r_i \delta_{g_i}) = \sum_{i \le k} r_i \varphi(g_i)$ is easily seen to be
a morphism of affine monoids that satisfies $\hat{\varphi}(\delta_g) = \varphi(g)$ for each $g \in G$, and is the unique such
since $\mathrm{Prob}_G(G)$ consists of simple measures. This shows $\mathrm{Prob}_G(G)$ is the free affine monoid over G,
and Theorem 3 implies it also is the free compact affine monoid over G since G is finite, and hence
compact. □

Remark 1. In [7, 15], the free affine monoid over a finite group is employed to deduce properties of quantum
channels. The Corollary shows that the free affine monoid over G is nothing other than $\mathrm{Prob}_G(G)$,
which implies it is compact, as well as telling us that it has a zero: the uniform distribution on G.
We believe other useful properties about quantum channels over finite groups can be deduced from this
observation.



3 Capacity as a topological concept 

In this section we develop a new approach to understanding the capacity of a classical channel. Our idea 
is to analyze capacity from a topological perspective, rather than from the usual perspective of inequalities
prevalent in information theory. We begin with a brief reprise of the basics of Shannon information; the 
standard reference for this material is Q. 

If $X\colon \mathcal{X} \to \mathbb{R}$ is a random variable on a finite probability space $(\mathcal{X}, p)$, then the entropy² of X
is defined as $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$. If $Y\colon \mathcal{Y} \to \mathbb{R}$ is another finite random variable, then the
conditional entropy of Y given X is

$$H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y \mid X = x) = \sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y \mid x) \log_2 \frac{1}{p(y \mid x)} \qquad (1)$$

and the mutual information in X and Y is

$$\mathcal{I}(Y, X) = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y).$$

If $C\colon \mathcal{X} \to \mathcal{Y}$ is a channel from inputs $\mathcal{X}$ to outputs $\mathcal{Y}$, then C is an $\mathcal{X} \times \mathcal{Y}$-matrix whose (x,y)-entry
is the conditional probability of output y occurring, given that the input was x. Each distribution p on the
inputs $\mathcal{X}$ then produces a corresponding distribution $p \cdot C$ on $\mathcal{Y}$. The capacity of a channel is given by

$$\mathrm{Cap}\colon [\mathrm{Prob}(\mathcal{X}) \to \mathrm{Prob}(\mathcal{Y})] \to [0,1] \quad\text{by}\quad \mathrm{Cap}(C) = \sup_{p} \; H(p \cdot C) - H(p \cdot C \mid p),$$

i.e., Cap(C) is the supremum of the possible mutual information values $\mathcal{I}(p \cdot C, p)$ as p ranges over the
distributions on $\mathcal{X}$, the set of inputs.

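If we let $\mathcal{X} = \mathcal{Y} = \underline{n}$ and $C\colon \underline{n} \to \mathrm{Prob}(\underline{n})$ is a channel, then

$$\mathrm{Cap}(C) = \sup \Big[ H\Big(\sum_{j \le n} r_j C(j \mid 1), \ldots, \sum_{j \le n} r_j C(j \mid n)\Big) - \sum_{i \le n} r_i H\big(C(i \mid 1), \ldots, C(i \mid n)\big) \Big] \qquad (2)$$

This formula requires some interpretation.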

2 We use H(X) to denote the entropy of a random variable X; this overloads our notation for the maximal subgroups of a 
monoid S, but we believe the context will be sufficient to make the meaning clear. 

1. First, the term to which H is first applied, $(\sum_{j \le n} r_j C(j,1), \ldots, \sum_{j \le n} r_j C(j,n))$, represents a
distribution on $\mathcal{Y} = \underline{n}$ obtained from pC, where $p = \sum_i r_i \delta_i$ is a distribution on $\mathcal{X} = \underline{n}$. This
is the p-convex combination of the n vectors $(C(1,1), \ldots, C(1,n)), \ldots, (C(n,1), \ldots, C(n,n)) \in [0,1]^n$
comprising the rows of the channel C, where C(i,j) denotes the i,j-entry of C, interpreted
as a conditional probability. Since C is a channel, each of these rows is a probability
distribution on $\underline{n}$. (As a sanity check, we see that applying H to the convex combination
$(\sum_{j \le n} r_j C(j,1), \ldots, \sum_{j \le n} r_j C(j,n))$ thus makes sense, since a convex combination of probability
distributions is another such, and H applies to probability distributions.)

2. Now, the convex combination $pC = (\sum_{j \le n} r_j C(j,1), \ldots, \sum_{j \le n} r_j C(j,n))$ is a point on the polytope
$K \subseteq [0,1]^n$ the rows of C generate, so

$$\Big(\Big(\sum_{j \le n} r_j C(j,1), \ldots, \sum_{j \le n} r_j C(j,n)\Big),\; H\Big(\sum_{j \le n} r_j C(j,1), \ldots, \sum_{j \le n} r_j C(j,n)\Big)\Big) \in [0,1]^n \times \mathbb{R}$$

represents the point on the surface H generates over the polytope K.

3. Likewise the second term, $\sum_{i \le n} r_i H(C(i \mid 1), \ldots, C(i \mid n))$ of Equation (2) is a p-convex combination,
$p = \sum_{i \le n} r_i \delta_i$, of the terms $H(C(i,1), \ldots, C(i,n))$, each of which is obtained by applying H to a row
of C, regarded as an element of $[0,1]^n$. We can regard each of the points $H(C(i,1), \ldots, C(i,n))$ as
being the (n+1)st coordinate of a tuple $(C(i,1), \ldots, C(i,n), H(C(i,1), \ldots, C(i,n))) \in \mathbb{R}^{n+1}$, and hence
the p-convex combination of these points lies on the polytope these points generate.

4. Finally, the difference $H(\sum_{j \le n} r_j C(j \mid 1), \ldots, \sum_{j \le n} r_j C(j \mid n)) - \sum_{i \le n} r_i H(C(i \mid 1), \ldots, C(i \mid n))$ is the
difference in the (n+1)st coordinates described under 2. and 3., so it is the height of the vertical line
between the point above pC described in 3., which lies on the polytope generated by

$$(C(1,1), \ldots, C(1,n), H(C(1,1), \ldots, C(1,n))), \ldots, (C(n,1), \ldots, C(n,n), H(C(n,1), \ldots, C(n,n))),$$

and the corresponding point on the surface H generates over K.

Thus, Cap(C) as presented by Equation (2) takes the supremum of the differences between the value of H
at a convex combination of the rows of C and the same convex combination of H applied to the rows of
C. It is well-known that entropy H is a strictly concave function, and we now take advantage of this to
formulate a result about Cap.
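
The geometric reading of Equation (2) can be tried out numerically. The sketch below is only an approximation under stated assumptions: it handles 2 x 2 channels, replaces the supremum by a grid search over input distributions (r, 1 - r), and uses a binary symmetric channel with crossover probability 0.1 as a hypothetical test case. The names `gap` and `capacity_2x2` are ours.

```python
import math

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def gap(r, C):
    """H(sum_i r_i C(i)) - sum_i r_i H(C(i)): height between surface and polytope."""
    n = len(C[0])
    mix = [sum(r[i] * C[i][j] for i in range(len(C))) for j in range(n)]
    return H(mix) - sum(r[i] * H(C[i]) for i in range(len(C)))

def capacity_2x2(C, steps=1000):
    """Grid-search approximation of the supremum over inputs (r, 1 - r)."""
    return max(gap((k / steps, 1 - k / steps), C) for k in range(steps + 1))

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]   # binary symmetric channel (illustrative)
cap = capacity_2x2(bsc)
```

For this channel the search finds its maximum at the uniform input, where the value agrees with the familiar expression 1 - H(eps, 1 - eps); at a vertex of the polytope (a Dirac input) the gap is zero.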

Definition 2. Let $K \subseteq \mathbb{R}^n$ be a convex set. A function $f\colon K \to \mathbb{R}$ is strictly concave if

$$f(rx + (1-r)y) > r f(x) + (1-r) f(y)$$

for all $r \in (0,1)$ and all $x \neq y \in K$.

We next recall Jensen's Inequality:

Theorem 4 (Jensen (cf. [9])). If $f\colon K \to \mathbb{R}$ is a convex function defined on a convex subset K of a vector
space V, then $E(f(X)) \ge f(E(X))$ for a finite random variable $X\colon \mathcal{X} \to K$, where E denotes expectation.
Moreover, if f is strictly convex, then $E(f(X)) = f(E(X))$ implies X is constant.

Jensen's Inequality is a fundamental result of information theory; for example, it is crucial for proving 
mutual information is non-negative, that the mutual information in a pair of random variables is 0 iff
the random variables are independent, and that entropy itself is strictly concave (cf. 0, Chapter 2). 
Since f is (strictly) concave iff $-f$ is (strictly) convex, the following generalizes Jensen's Inequality by
strengthening the result in case f is strictly convex.

Lemma 1. Let $f\colon K \to \mathbb{R}$ be defined from a convex subset K to $\mathbb{R}$. Then the following are equivalent:

1. f is strictly concave.

2. For all $r_1, \ldots, r_m \in (0,1)$ and all $x_1, \ldots, x_m \in K$,

$$\sum_{i \le m} r_i = 1 \implies f\Big(\sum_{i \le m} r_i x_i\Big) > \sum_{i \le m} r_i f(x_i).$$

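Proof. (ii) $\Rightarrow$ (i) is obvious. For the reverse direction, we proceed by induction on m. The base case,
m = 2, is just the definition of strict concavity. So suppose (ii) holds for some m, and consider a family
$r_1, \ldots, r_{m+1} \in (0,1)$ summing to 1 and $x_1, \ldots, x_{m+1} \in K$. Since $r_i \in (0,1)$ for each i,

$$\begin{aligned}
f\Big(\sum_{i \le m+1} r_i x_i\Big) &= f\Big(\sum_{i \le m-1} r_i x_i + (r_m + r_{m+1})\Big(\frac{r_m}{r_m + r_{m+1}} x_m + \frac{r_{m+1}}{r_m + r_{m+1}} x_{m+1}\Big)\Big) \\
&> \sum_{i \le m-1} r_i f(x_i) + (r_m + r_{m+1})\, f\Big(\frac{r_m}{r_m + r_{m+1}} x_m + \frac{r_{m+1}}{r_m + r_{m+1}} x_{m+1}\Big) \\
&> \sum_{i \le m-1} r_i f(x_i) + r_m f(x_m) + r_{m+1} f(x_{m+1}) \\
&= \sum_{i \le m+1} r_i f(x_i). \qquad \Box
\end{aligned}$$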
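
A quick numerical experiment illustrates the strict inequality in Lemma 1(2) for the entropy function. This is a sanity check, not a proof; the random sampling scheme, the seed and the helper names are our own assumptions, and the weights are drawn so that each is strictly positive.

```python
import math
import random

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def random_distribution(k, rng):
    """A random point of the simplex with all coordinates strictly positive."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def jensen_gap(weights, points, f):
    """f(sum_i r_i x_i) - sum_i r_i f(x_i) for vectors x_i."""
    dim = len(points[0])
    mix = [sum(r * x[j] for r, x in zip(weights, points)) for j in range(dim)]
    return f(mix) - sum(r * f(x) for r, x in zip(weights, points))

rng = random.Random(0)
gaps = []
for _ in range(200):
    m = rng.randint(2, 5)
    points = [random_distribution(3, rng) for _ in range(m)]
    weights = random_distribution(m, rng)
    gaps.append(jensen_gap(weights, points, H))
```

Every sampled gap is strictly positive, while repeating the same point gives a gap of zero, which is the degenerate case the strictness condition excludes.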



Notation: If $K \subseteq \mathbb{R}^n$ is a compact convex set, then $\mathrm{Con}_n(K)$ denotes the family of convex polytopes
$\mathrm{conv}(\{x_1, \ldots, x_k\})$ generated by finite subsets $\{x_1, \ldots, x_k\} \subseteq K$, where $k \le n$.

Also note that $P_n \stackrel{\mathrm{def}}{=} \{x \in [0,1]^n \mid \sum_i x_i = 1\}$ is a compact, convex subset of $[0,1]^n$, which we identify
with the family $\mathrm{Prob}(\underline{n})$ of probability distributions on n.

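Proposition 1. Let $K \subseteq \mathbb{R}^n$ be a compact, convex set, and let $f\colon K \to \mathbb{R}$ be continuous and strictly
concave. Define

$$\hat{f}\colon \mathrm{Con}_n(K) \to \mathbb{R}^{op} \quad\text{by}\quad \hat{f}(\mathrm{conv}(\{x_1, \ldots, x_k\})) = \sup_{(r_1, \ldots, r_k) \in [0,1]^k,\ \sum_i r_i = 1} f\Big(\sum_{i \le k} r_i x_i\Big) - \sum_{i \le k} r_i f(x_i).$$

Then $\hat{f}$ is continuous and monotone with respect to reverse inclusion.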

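Proof. The compactness of K implies that the family $\mathrm{Con}_n(K)$ is closed under filtered intersections
in the hyperspace of non-empty, closed subsets of K, and then the continuity of $\hat{f}$ follows
from the continuity of f. This map is clearly monotone. To show it is strictly monotone, let
$\mathrm{conv}(\{x_1, \ldots, x_k\}), \mathrm{conv}(\{y_1, \ldots, y_m\}) \in \mathrm{Con}_n(K)$ with $\mathrm{conv}(\{x_1, \ldots, x_k\}) \subsetneq \mathrm{conv}(\{y_1, \ldots, y_m\})$. Since f
is continuous, $\hat{f}(\mathrm{conv}(\{x_1, \ldots, x_k\}))$ assumes its value at some point in $\mathrm{conv}(\{x_1, \ldots, x_k\})$, and since f is
strictly concave, this value is not assumed at $x_i$ for any index i. Thus, there is a k-tuple $(r_1, \ldots, r_k) \in (0,1)^k$
with

$$\hat{f}(\mathrm{conv}(\{x_1, \ldots, x_k\})) = f\Big(\sum_{i \le k} r_i x_i\Big) - \sum_{i \le k} r_i f(x_i).$$

Because $\mathrm{conv}(\{x_1, \ldots, x_k\}) \subseteq \mathrm{conv}(\{y_1, \ldots, y_m\})$, for each $i \le k$, there is $(s_{i1}, \ldots, s_{im}) \in [0,1]^m$ with
$x_i = \sum_{j \le m} s_{ij} y_j$, and at least one of the families $(s_{ij})_{j \le m} \in (0,1)^m$. For this index i, we have
$f(x_i) = f(\sum_{j \le m} s_{ij} y_j) > \sum_{j \le m} s_{ij} f(y_j)$. Lemma 1(ii) then implies that

$$\sum_{i \le k} r_i f(x_i) = \sum_{i \le k} r_i f\Big(\sum_{j \le m} s_{ij} y_j\Big) > \sum_{i \le k} \sum_{j \le m} r_i s_{ij} f(y_j),$$

and so

$$\begin{aligned}
\hat{f}(\mathrm{conv}(\{x_1, \ldots, x_k\})) &= f\Big(\sum_{i \le k} r_i x_i\Big) - \sum_{i \le k} r_i f(x_i) \\
&< f\Big(\sum_{i \le k} r_i \Big(\sum_{j \le m} s_{ij} y_j\Big)\Big) - \sum_{i \le k} \sum_{j \le m} r_i s_{ij} f(y_j) \\
&\le \sup_{(s_1, \ldots, s_m) \in [0,1]^m,\ \sum_j s_j = 1} f\Big(\sum_{j \le m} s_j y_j\Big) - \sum_{j \le m} s_j f(y_j) \\
&= \hat{f}(\mathrm{conv}(\{y_1, \ldots, y_m\})). \qquad \Box
\end{aligned}$$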
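
The monotonicity statement can be illustrated numerically for f = H on the simplex of distributions on three points. The sketch below is a grid approximation of the supremum, under our own choice of two nested polytopes; `f_hat` is a hypothetical helper name standing for the map defined in Proposition 1.

```python
import math
from itertools import product

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def f_hat(points, steps=60):
    """Grid approximation of sup_r H(sum_i r_i x_i) - sum_i r_i H(x_i)."""
    k, best = len(points), 0.0
    for c in product(range(steps + 1), repeat=k - 1):
        if sum(c) > steps:
            continue                     # weights must stay inside the simplex
        r = [x / steps for x in c] + [1 - sum(c) / steps]
        mix = [sum(ri * x[j] for ri, x in zip(r, points))
               for j in range(len(points[0]))]
        best = max(best, H(mix) - sum(ri * H(x) for ri, x in zip(r, points)))
    return best

big = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]                        # all of P_3
small = [(0.6, 0.2, 0.2), (0.2, 0.6, 0.2), (0.2, 0.2, 0.6)]    # a sub-polytope
```

The smaller polytope yields a strictly smaller value, and for the full simplex the value is log2(3), attained at the uniform weights.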



4 Domains 



In this section, we introduce domains, which are the next ingredient in our analysis of classical channels. 
For details about these structures, a standard reference is HI or QQ. A partial order is a non-empty set 
P endowed with a reflexive, antisymmetric and transitive relation. A subset $D \subseteq P$ is directed if every
finite subset of D has an upper bound in D; P is directed complete if every directed subset of P has a least
upper bound in P. We denote directed complete partial orders as dcpos.

If P and Q are dcpos, then $f\colon P \to Q$ is Scott continuous if f is monotone and preserves suprema
of directed sets. An equivalent definition is available using topology: a subset $U \subseteq P$ is Scott open if
$U = {\uparrow}U = \{x \in P \mid (\exists u \in U)\ u \le x\}$ is an upper set, and for any directed subset $D \subseteq P$, if $\sup D \in U$,
then $D \cap U \neq \emptyset$. The Scott-open sets form a topology on P, called the Scott topology, and the functions
$f\colon P \to Q$ that are continuous with respect to this topology are exactly those that are Scott continuous,
as defined above.

Example 4. Let K be a compact convex subset of a topological vector space, and let Con(K) denote the
compact convex subsets of K. We can order these by reverse inclusion: $C \sqsubseteq C' \iff C' \subseteq C$. A directed
family $D \subseteq \mathrm{Con}(K)$ is simply a filter basis, and since K is compact and each set in D is convex, the set
$\bigcap D \in \mathrm{Con}(K)$. Thus Con(K) is a dcpo.

We can say more. If $C, C' \in \mathrm{Con}(K)$ and $C' \subseteq C^\circ$, the interior of C, then given any directed set D
with $\bigcap D \subseteq C'$, there is some $E \in D$ with $E \subseteq C^\circ$, and hence $E \subseteq C$. In this case we say C is way-below
C', and we write $C \ll C'$. In fact, if the ambient topological vector space is locally convex, then each
$C' \in \mathrm{Con}(K)$ is the filtered intersection of those C satisfying $C \ll C'$: this follows from the fact that in any
compact Hausdorff space, each compact subset is the filtered intersection of its compact neighborhoods,
and the same applies to compact, convex sets in a locally convex topological vector space.

A domain is a dcpo P in which $\{y \in P \mid y \ll x\}$ is directed and $x = \sup\{y \in P \mid y \ll x\}$ for each
$x \in P$. The original motivation for domains was to provide semantic models for high-level programming
languages, where the fact that any Scott-continuous selfmap on a domain with least element has a least 
fixed point gives a canonical model for recursion. 
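
The least-fixed-point construction mentioned here can be sketched on a finite example, where every monotone map is Scott continuous and the iteration from the least element stabilizes after finitely many steps. The poset (subsets of {0, ..., 9} under inclusion) and the map `step` are hypothetical choices made only for illustration.

```python
def least_fixed_point(f, bottom):
    """Iterate bottom <= f(bottom) <= f(f(bottom)) <= ... until it stabilizes."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

# Hypothetical monotone selfmap on subsets of {0, ..., 9}: always include 0,
# and include x + 2 whenever x is already present.
step = lambda s: frozenset({0}) | frozenset(x + 2 for x in s if x + 2 < 10)

evens = least_fixed_point(step, frozenset())   # least element: the empty set
```

Here the iteration builds up the even numbers below 10 one at a time; in an infinite domain the least fixed point is instead the supremum of the corresponding chain.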

Motivated by examples of selfmaps that are not Scott continuous, Martin [13] devised another approach
to guaranteeing fixed points, using the concept of a measurement: A Scott-continuous function
$m\colon P \to [0,\infty)^{op}$ is said to measure the content at $x \in P$ if, given $U \subseteq P$ Scott open,

$$x \in U \implies (\exists \varepsilon > 0)\ m_\varepsilon(x) \subseteq U,$$

where $m_\varepsilon(x) = \{y \le x \mid m(y) - m(x) < \varepsilon\}$. We say that m measures P if m measures the content at x for
each $x \in P$.

For our next result, we need some notions from topology. Recall that a subset $A \subseteq X$ of a topological
space is saturated if $A = \bigcap\{U \mid A \subseteq U \text{ open}\}$. The saturation of a subset A is $\bigcap\{U \mid A \subseteq U \text{ open}\}$, so a
set is saturated iff it is equal to its saturation. Moreover, a subset is compact iff its saturation is compact.

Definition 3. A continuous function $f\colon X \to Y$ between topological spaces is proper if $f^{-1}(K)$ is compact
for each saturated, compact subset $K \subseteq Y$.

A Scott-continuous map $m\colon P \to Q$ between continuous posets is proper iff $m^{-1}({\uparrow}y)$ is Scott compact
for each $y \in Q$. We say that a Scott-continuous mapping $m\colon P \to Q$ between posets is proper at $x \in P$
if ${\downarrow}x \cap m^{-1}({\uparrow}y)$ is Scott compact in ${\downarrow}x$ for each $y \in Q$. We use $[0,\infty)^{op}$ to denote the non-negative real
numbers in the dual order; the following result is from ifTUl :

Proposition 2. Let P be a domain and let $m\colon P \to [0,\infty)^{op}$ be Scott continuous. If m is proper at $x \in P$,
then the following are equivalent:

1. m measures the content at x.

2. m is strictly monotone at $x \in P$: i.e., $y \le x$ and $m(y) = m(x)$ imply $y = x$.

In particular, a Scott-continuous, proper map $m\colon P \to [0,\infty)^{op}$ measures P iff m is strictly monotone at
each $x \in P$.

Corollary 2. Let $P_n = \{x \in [0,1]^n \mid \sum_i x_i = 1\}$ be the compact, convex set of distributions on n. Then
$(\mathrm{Con}_n(P_n), \supseteq)$ is a domain, and the mapping $\mathrm{cap}\colon \mathrm{Con}_n(P_n) \to [0,\infty)^{op}$ by

$$\mathrm{cap}(\mathrm{conv}(F)) = \sup\Big\{ H\Big(\sum_{x \in F} r_x \cdot x\Big) - \sum_{x \in F} r_x H(x) \;\Big|\; r_x \ge 0,\ \sum_{x \in F} r_x = 1 \Big\}$$

measures $(\mathrm{Con}_n(P_n), \supseteq)$.

Proof. The discussion in Example 4 applies to $K = P_n$ to show that $\mathrm{Con}(P_n)$ is a domain, and $\mathrm{Con}_n(P_n)$
is closed in $\mathrm{Con}(P_n)$ under filtered intersections. Since $\mathbb{R}^n$ is locally convex, it's easy to show that each
conv(F) is the intersection of sets conv(G), where $\mathrm{conv}(F) \subseteq \mathrm{conv}(G)^\circ$ and $|G| \le n$ if $|F| \le n$. Hence
$(\mathrm{Con}_n(P_n), \supseteq)$ is a domain.

A compact, saturated subset of $[0,\infty)^{op}$ has the form A = [0,r] for some r, and then
$\mathrm{cap}^{-1}([0,r]) = \{\mathrm{conv}(F) \mid \mathrm{cap}(\mathrm{conv}(F)) \le r\}$ is closed, and hence compact, since cap is continuous and
$\mathrm{Con}(P_n)$, and hence also $\mathrm{Con}_n(P_n)$, are compact in the Lawson topology (cf. HUH). But $\mathrm{cap}^{-1}([0,r])$ also is
an upper set, so it is Scott compact. Thus cap is proper, and so it measures $\mathrm{Con}_n(P_n)$ iff it is strictly monotone.
But the latter follows from Proposition 1 since entropy is strictly concave. □

Remark: As we will see in a moment, the real import of this last result is not so much that cap measures
$\mathrm{Con}_n(P_n)$, per se, but rather that this implies the mapping cap is strictly monotone. This will tell us that
under an appropriate (pre-)order, having one channel strictly below another implies that the capacity of
the lower channel is strictly less than that of the larger one.

We also note that Martin and Panangaden have obtained results in [16] that can be used to derive
Proposition 2 and Corollary 2.

5 A domain-like structure of ST(n)

We have seen that ST(n) is a compact affine monoid, and we already commented that every compact
semigroup S has a unique smallest ideal: i.e., a non-empty subset $I \subseteq S$ satisfying $IS \cup SI \subseteq I$. This
minimal ideal is denoted $\mathcal{M}(S)$, and it is closed, hence compact. For example, we noted that $\mathcal{M}(\mathrm{DT}(n))$
is a point, which is the equidistribution on n. A reference for much of the material in this section
is [10], where basic results about compact affine monoids are laid out. A good reference for results about
transformation semigroups can be found in Bl . 

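Proposition 3. If $C \in \mathrm{ST}(n)$, then $C(\mathrm{Prob}(\underline{n})) = \mathrm{conv}(\{\delta_i C \mid i \in \underline{n}\})$ is a convex polytope in $[0,1]^n$.

Proof. If $C \in \mathrm{ST}(n)$, then $C\colon \mathrm{Prob}(\underline{n}) \to \mathrm{Prob}(\underline{n})$ is an affine mapping, so it preserves the convex
structure of $\mathrm{Prob}(\underline{n})$. It follows that $C(\mathrm{Prob}(\underline{n})) = \mathrm{conv}(\{\delta_i C \mid i \le n\})$, where $\delta_i \in \mathrm{Prob}(\underline{n})$ is the Dirac
measure on $i \in \underline{n}$. Thus, $C(\mathrm{Prob}(\underline{n}))$ is a convex polytope in $\mathrm{Prob}(\underline{n})$. □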

As a result of the Proposition, we can define a relation on ST(n) by

$$C \equiv C' \iff C(\mathrm{Prob}(\underline{n})) = C'(\mathrm{Prob}(\underline{n})). \qquad (3)$$

This is clearly a closed equivalence relation, and because channels are affine maps,

$$C \equiv C' \iff C(\{\delta_i \mid i \in \underline{n}\}) = C'(\{\delta_i \mid i \in \underline{n}\}). \qquad (4)$$

Now, $C(\delta_i) = \delta_i C = C(i)$, the i-th row of C, so $C \equiv C'$ iff C and C' have the same set of row vectors.
Hence,

$$C \equiv C' \iff (\exists \pi \in S(n))\ M_\pi C = C', \qquad (5)$$

where $M_\pi$ is the stochastic matrix representing the permutation $\pi \in S(n)$.
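
One direction of these equivalences is easy to check numerically: permuting the rows of a channel leaves the set of rows, and hence the capacity, unchanged. The following sketch assumes a hypothetical 2 x 2 channel, a grid approximation of the supremum, and the convention that row i of $M_\pi C$ is row $\pi(i)$ of C.

```python
import math

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def permute_rows(pi, C):
    """M_pi C under our convention: row i of the result is row pi[i] of C."""
    return [C[pi[i]] for i in range(len(C))]

def capacity_2rows(C, steps=400):
    """Grid approximation of sup_r H(r C(0) + (1-r) C(1)) - r H(C(0)) - (1-r) H(C(1))."""
    best = 0.0
    for k in range(steps + 1):
        r = k / steps
        mix = [r * a + (1 - r) * b for a, b in zip(C[0], C[1])]
        best = max(best, H(mix) - r * H(C[0]) - (1 - r) * H(C[1]))
    return best

C = [[0.9, 0.1], [0.3, 0.7]]             # hypothetical channel
C_swapped = permute_rows((1, 0), C)
same_rows = {tuple(r) for r in C} == {tuple(r) for r in C_swapped}
```

Both matrices generate the same polytope of output distributions, so the approximated capacities coincide up to rounding.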

We also can obtain an algebraic representation of the relation $\equiv$ using the monoid structure of ST(n).

Definition 4. If X is a set, then the full transformation semigroup T(X) on X is the family of all selfmaps
of X under composition. A transformation semigroup is a subsemigroup of T(X) for some set X.

Notation: If $S \subseteq T(X)$ is a transformation semigroup, then for $s, s' \in S$ and $x \in X$, the element $ss' \in S$
denotes the function $ss'(x) = s'(s(x))$, i.e., we use the "algebraic notation" for function application,
which agrees with our representation of matrix multiplication as composition of functions.

Here are some simple observations about T(X); the proofs are all straightforward: 

1. T(X) is a monoid whose group of units is the family of bijections of X; if X is finite, this is just
S(|X|), the group of permutations of |X|-many letters.

2. Each constant map $f_x\colon X \to X$ by $f_x(y) = x$ is a left zero in T(X): if $g \in T(X)$, then $g f_x = f_x$. It
follows from general semigroup theory that $\mathcal{M}(T(X)) = \{f_x \mid x \in X\}$.

3. If S is a transformation semigroup on X and $S \cap \mathcal{M}(T(X)) \neq \emptyset$, then $\mathcal{M}(S) = S \cap \mathcal{M}(T(X))$. Thus,
if S contains a constant map, then $\mathcal{M}(S)$ consists of constant maps. This follows from the fact that
gf is a constant map if either f or g is one, so the constant maps in S form an ideal.

4. If S is a transformation semigroup, then for each $s \in S$ and each $x \in X$, $f_x \in S^1 s \implies x \in s(X)$.³
Indeed, if $f_x \in Ss \cup \{s\}$, then there is some $s' \in S^1$ with $f_x = s's$, so $x = s(s'(y)) \in s(X)$.

3 If 5 is a semigroup, then S l denotes the semigroup 5 with an identity element adjoined. 






5. Conversely, if $\mathcal{M}(S) = \{f_x \mid x \in X\}$, then $x \in s(X) \implies f_x \in S^1 s$. This follows since $x \in s(X)$ implies
$x = s(y)$, for some $y \in X$; if $s \neq f_x$, then $f_x = f_y s \in Ss$.
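
The observations above can be checked by brute force on a small set. The sketch below is illustrative only: it assumes $X = \{0, 1, 2\}$, stores a selfmap $s$ as the tuple $(s(0), s(1), s(2))$, and uses hypothetical helper names. Products follow the algebraic convention $ss'(x) = s'(s(x))$ from the Notation paragraph.

```python
from itertools import product

X = (0, 1, 2)

def compose(s, t):
    """Algebraic-notation product st: apply s first, then t."""
    return tuple(t[s[x]] for x in X)

# The full transformation semigroup T(X): all 27 selfmaps of X.
T = list(product(X, repeat=len(X)))

# The constant maps f_x, one for each x in X.
constants = [tuple(x for _ in X) for x in X]

# Observation 2: g f_x = f_x for every g in T(X).
absorbing = all(compose(g, f) == f for g in T for f in constants)

def in_left_ideal(f, s):
    """True if f = t s for some t in T(X); T(X) already has an identity."""
    return any(compose(t, s) == f for t in T)

# Observations 4 and 5 with S = T(X): f_x lies in the left ideal
# generated by s exactly when x belongs to the image s(X).
ideal_matches_image = all(
    in_left_ideal(constants[x], s) == (x in set(s))
    for s in T for x in X
)
```

Because $T(X)$ is a monoid, $S^1 = S$ in this instance, so the search over `T` covers the whole left ideal.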

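Definition 5. Let $S$ be a monoid. We define the relations $\equiv_M$ and $\leq_M$ on $S$ by:

$$s \equiv_M s' \iff s\,\mathcal{M}(S) = s'\,\mathcal{M}(S), \quad \text{and} \tag{6}$$
$$s \leq_M s' \iff s\,\mathcal{M}(S) \subseteq s'\,\mathcal{M}(S).$$

It is routine to show that $\equiv_M$ and $\leq_M$ are both (topologically) closed relations on any compact monoid $S$.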

Combining properties 4. and 5. above on transformation semigroups with equivalences (3)-(5) yields:

Proposition 4. Let $n \geq 1$ and $C, C' \in \mathrm{ST}(n)$. Then

$$C(\mathrm{Prob}(n)) = C'(\mathrm{Prob}(n)) \iff \mathcal{M}(\mathrm{ST}(n))\,C = \mathcal{M}(\mathrm{ST}(n))\,C' \iff (\exists \pi \in S(n))\ M_\pi C = C'. \tag{7}$$
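
One way to see why the middle term of (7) records the image of $C$: assuming, as stated in Example 5, that the minimal ideal of $\mathrm{ST}(n)$ consists of the channels whose rows are all equal, left-multiplying $C$ by the constant channel with row $r$ gives the constant channel with row $rC$. The Python sketch below checks this on one example; `constant_channel` and `mat_mul` are hypothetical helpers, and the matrix entries are arbitrary.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Matrix product A B for square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def constant_channel(r, n):
    """Channel all of whose n rows equal the distribution r."""
    return [list(r) for _ in range(n)]

C = [[F(1, 2), F(1, 2)],
     [F(1, 5), F(4, 5)]]
r = [F(1, 3), F(2, 3)]

# r C, the image of the distribution r under the channel C.
rC = [sum(r[i] * C[i][j] for i in range(2)) for j in range(2)]

# Left-multiplying C by the constant channel with row r gives the
# constant channel with row r C.
product_is_constant = (
    mat_mul(constant_channel(r, 2), C) == constant_channel(rC, 2)
)

# With r a point mass (the matrix O_1), the product picks out the
# first row of C.
O_1 = constant_channel([F(1), F(0)], 2)
picks_first_row = mat_mul(O_1, C) == constant_channel(C[0], 2)
```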

Since $\equiv_M \;=\; \leq_M \cap\, (\leq_M)^{-1}$, we can form the relation $\leq_M / \equiv_M$, which is a closed partial order on
$\mathrm{ST}(n)/\equiv_M$.

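Theorem 5. Let $n \geq 1$. Then

1. The relation $\equiv_M$ is a left congruence on $\mathrm{ST}(n)$.

2. $(\mathrm{ST}(n)/\equiv_M,\ \leq_M/\equiv_M)$ is a compact ordered space, and the quotient map $\pi : \mathrm{ST}(n) \to \mathrm{ST}(n)/\equiv_M$
is a monotone map.

3. As an ordered space, $\mathrm{ST}(n)/\equiv_M\ \cong\ \mathrm{Con}_n(P_n)$.

4. Thus $\mathrm{cap} : (\mathrm{ST}(n)/\equiv_M,\ \leq_M/\equiv_M) \to [0,\infty)^{\mathrm{op}}$ measures $(\mathrm{ST}(n)/\equiv_M,\ \leq_M/\equiv_M)$.

5. If $C \in \mathrm{ST}(n)$, then $\mathrm{Cap}(C) = \mathrm{cap}(\mathrm{conv}(C(1), \ldots, C(n)))$, where $C(i)$ is the $i$-th row of $C$.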

Proof. For 1., it is clear from Equation (6) that $C \equiv_M C'$ implies $C''C \equiv_M C''C'$. Thus, $\equiv_M$ is a left
congruence.

Because $\equiv_M$ is a closed relation and $\mathrm{ST}(n)$ is compact, the quotient space is compact and the quotient
map is closed and continuous. The definition of the pre-order $\leq_M$ and the quotient order $\leq_M/\equiv_M$ implies
the quotient map is monotone.

3. follows from Proposition 4, from which 4. and 5. are clear. □

So we see that $\mathrm{ST}(n)$ has a natural pre-order $\leq_M$ defined by its algebraic structure as a compact
monoid, and if $C \leq_M C'$ then $\mathrm{Cap}(C) \leq \mathrm{Cap}(C')$. Moreover, this pre-order turns into a partial order on
$\mathrm{ST}(n)/\equiv_M$, and here capacity is the mapping cap. Importantly, cap, and hence Cap on $\mathrm{ST}(n)$, is
strictly monotone with respect to this partial (pre-) order. Moreover, $C \equiv_M C'$ iff $S(n)C = S(n)C'$, so each
is a permutation of the rows of the other.
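
To experiment with these statements numerically, one can estimate Cap with the Arimoto-Blahut iteration mentioned in Section 6. The following is a minimal pure-Python sketch, not the paper's construction: `capacity` is a hypothetical helper, capacities are reported in bits, and the stopping rule compares the usual lower and upper bounds produced by the iteration. The example checks that permuting the rows of a channel leaves the estimate unchanged, as item 5 of Theorem 5 predicts.

```python
import math

def capacity(C, tol=1e-9, max_iter=10_000):
    """Arimoto-Blahut estimate, in bits, of the capacity of a
    row-stochastic matrix C given as a list of rows of floats."""
    n, m = len(C), len(C[0])
    x = [1.0 / n] * n                      # start from the uniform input
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution q = x C.
        q = [sum(x[i] * C[i][j] for i in range(n)) for j in range(m)]
        # c[i] = exp(D(C(i) || q)), the relative entropy of row i from q.
        c = [math.exp(sum(C[i][j] * math.log(C[i][j] / q[j])
                          for j in range(m) if C[i][j] > 0))
             for i in range(n)]
        total = sum(x[i] * c[i] for i in range(n))
        lower, upper = math.log(total), math.log(max(c))
        if upper - lower < tol:            # bounds (in nats) have met
            break
        x = [x[i] * c[i] / total for i in range(n)]
    return lower / math.log(2)             # convert nats to bits

# Illustrative channels (values chosen arbitrarily).
bsc = [[0.9, 0.1], [0.1, 0.9]]
C = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
permuted = [C[2], C[0], C[1]]

cap_bsc = capacity(bsc)
cap_C = capacity(C)
cap_permuted = capacity(permuted)
```

For the binary symmetric channel the estimate can be compared with the closed form $1 - H(0.1)$, and `cap_C` and `cap_permuted` should agree up to the tolerance.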

Example 5. Here's an example of what our results tell us about $\mathrm{ST}(n)$. Recall that a Z-channel is a
binary channel of the form

$$Z_p = \begin{pmatrix} 1 & 0 \\ 1-p & p \end{pmatrix} = p \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + (1-p) \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},$$

showing that each lies on a one-parameter semigroup $\gamma : ([0,1], \cdot) \to \mathrm{ST}(2)$ by $\gamma(p) = Z_p$. Now, $\gamma$ is a
homomorphism, so $p \leq p' \implies Z_p = Z_{p/p'} Z_{p'}$, while obviously, $p \neq p' \implies S(2)Z_p \cap S(2)Z_{p'} = \emptyset$. It follows
that $p < p' \implies \pi(Z_p) < \pi(Z_{p'}) \implies \mathrm{Cap}(Z_p) = \mathrm{cap}(\pi(Z_p)) < \mathrm{cap}(\pi(Z_{p'})) = \mathrm{Cap}(Z_{p'})$, so the Z-channels
$\{Z_p \mid 0 \leq p \leq 1\}$ all have distinct capacities.

Similarly, the matrices

$$Z'_p = \begin{pmatrix} p & 1-p \\ 0 & 1 \end{pmatrix} = p \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + (1-p) \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$$

form a one parameter semigroup from $I_2$ to $\mathcal{M}(\mathrm{ST}(2))$, along which Cap is strictly decreasing. Now
$\mathcal{M}(\mathrm{ST}(2)) = \{r \cdot O_1 + (1-r) O_2 \mid 0 \leq r \leq 1\}$, where $O_i$ is the matrix both of whose rows are $(\delta_{1i}, \delta_{2i})$,
for $i = 1,2$. For each fixed $r \in [0,1]$, there is a one-parameter semigroup $p \mapsto p \cdot I_2 + (1-p) \cdot (r \cdot Z_p +
(1-r) \cdot Z'_p)$, and combining the earlier results, we conclude that along this one-parameter semigroup,
Cap is strictly decreasing. Note as well that $\mathrm{conv}(\{I_2\} \cup \mathcal{M}(\mathrm{ST}(2)))$ is equal to the union of these
one-parameter semigroups.

We can generalize this verbatim to $\mathrm{ST}(n)$: define a Z-channel in $\mathrm{ST}(n)$ to be one of the form
$Z_p = p \cdot I_n + (1-p) \cdot O_k$, where $p \in [0,1]$ and $O_k$ is the channel in $\mathcal{M}(\mathrm{ST}(n))$ all of whose rows are
$(0, \ldots, 0, 1, 0, \ldots)$, where the unique 1 appears in the $k$-th entry. As in the binary case, $p \leq p' \implies Z_p =
Z_{p/p'} Z_{p'}$, and $p \neq p' \implies S(n)Z_p \cap S(n)Z_{p'} = \emptyset$. So we can again conclude that Cap is strictly decreasing
along this one-parameter semigroup. As in the case of $n = 2$, $\mathcal{M}(\mathrm{ST}(n)) = \{\sum_{k \leq n} r_k O_k \mid \sum_k r_k = 1\}$, so
the result extends verbatim to these one-parameter semigroups.
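
The homomorphism property behind this argument, $Z_p Z_{p'} = Z_{pp'}$, is easy to confirm with exact arithmetic. The sketch below is illustrative only: `z_channel` is a hypothetical helper that builds $p \cdot I_n + (1-p) \cdot O_k$ with a 0-based index `k`, and the particular values of `n`, `k`, `p`, and `q` are arbitrary.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Matrix product A B for square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def z_channel(p, n, k):
    """Z_p = p*I_n + (1-p)*O_k, where every row of O_k is the point
    mass at position k (0-based)."""
    return [[p * int(i == j) + (1 - p) * int(j == k) for j in range(n)]
            for i in range(n)]

n, k = 3, 1
p, q = F(1, 3), F(3, 4)

# p -> Z_p turns multiplication in [0, 1] into matrix multiplication.
is_homomorphism = (
    mat_mul(z_channel(p, n, k), z_channel(q, n, k)) == z_channel(p * q, n, k)
)

# Consequently, for p <= q with q > 0, Z_p factors through Z_q on the right.
factors = mat_mul(z_channel(p / q, n, k), z_channel(q, n, k)) == z_channel(p, n, k)
```

The computation relies only on $O_k O_k = O_k$ and $C\,O_k = O_k$ for stochastic $C$, which is why the same check works for every $n$ and $k$.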

6 Summary and future work 

We have used an array of tools to analyze the capacity map on the set of classical channels. In Section 3
we gave a topological interpretation of capacity of a channel: it is the maximum distance between the
surface generated by the entropy function applied to the rows of the channel, and the polytope generated
by the entropy function applied to each individual row. This suggests a method for computing capacity:
the Generalized Mean Value Theorem implies the capacity is achieved at the unique place where the
gradient of the capacity function is 0; this point is unique because entropy is strongly concave. Moreover,
this produces the input distribution where capacity is achieved. By contrast, the celebrated Arimoto-Blahut
Algorithm [?], commonly used to compute the capacity of a discrete memoryless channel, is an iterative
procedure that approximates the capacity, not the input distribution where its value is assumed. An
algorithm more closely related to our results can be found in [19], where the iteration scheme follows
the concavity of the capacity function using Newton's Method.
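
The "maximum distance" description can be made concrete for a channel with two inputs. For an input distribution $x$, the gap between the entropy of the mixture $xC$ and the corresponding mixture of row entropies is $H(xC) - \sum_i x_i H(C(i))$, and capacity is its maximum over $x$. The Python sketch below is a rough illustration under stated assumptions: it uses a plain ternary search on $[0,1]$ (justified here only by concavity of the gap), not the Newton scheme of the cited work, and the names `entropy`, `gap`, and `binary_capacity` are hypothetical.

```python
import math

def entropy(q):
    """Shannon entropy of a distribution q, in bits."""
    return -sum(t * math.log2(t) for t in q if t > 0)

def gap(x, C):
    """H(xC) minus the x-weighted average of the row entropies of C."""
    n, m = len(C), len(C[0])
    out = [sum(x[i] * C[i][j] for i in range(n)) for j in range(m)]
    return entropy(out) - sum(x[i] * entropy(C[i]) for i in range(n))

def binary_capacity(C, iters=200):
    """Maximize t -> gap((t, 1 - t), C) on [0, 1] by ternary search.
    Returns the maximal gap (in bits) and the maximizing t."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if gap([a, 1 - a], C) < gap([b, 1 - b], C):
            lo = a
        else:
            hi = b
    t = (lo + hi) / 2
    return gap([t, 1 - t], C), t

# A Z-channel with parameter 1/2 (illustrative choice).
z_half = [[1.0, 0.0], [0.5, 0.5]]
cap_z, t_z = binary_capacity(z_half)
```

Unlike a capacity-only estimate, this search also returns the input distribution $(t, 1-t)$ at which the maximum is attained.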

We also applied our topological result to derive a domain-theoretic interpretation of capacity, again
using the strong concavity of entropy: the family $\mathrm{Con}_n(P_n)$ of polytopes with at most $n$ vertices is a
domain, and capacity measures this domain. The important point is that this implies capacity is strictly
monotone with respect to the partial order. We found that $\mathrm{ST}(n)$ has a natural, algebraically-defined
pre-order whose associated equivalence $\equiv_M$ defines a closed congruence on $\mathrm{ST}(n)$, and modulo which we
obtain a copy of $\mathrm{Con}_n(P_n)$. This implies that capacity is strictly monotone with respect to the pre-order
on $\mathrm{ST}(n)$.

In addition, we used the probabilistic measures on compact spaces to define three monads, each of
which tells us something about classical channels. The first realizes $\mathrm{ST}(m,n)$ as the morphisms on the
Kleisli category of the monad $\mathrm{Prob}_S$. The second shows that $\mathrm{ST}(n)$ is the free compact, affine monoid
$\mathrm{Prob}_M[n \to n]$, while the third shows that $\mathrm{DT}(n)$ is the free compact monoid over $S(n)$, the group of
permutations on $n$ letters. This last also implies $\mathrm{DT}(n)$ has a zero, which is Haar measure on $S(n)$.
The work discussed here concerns classical channels, but we believe that much of it could be generalized
to the quantum setting. We pointed out one connection to existing work on the role of free affine
monoids in analyzing quantum qubit channels. In any case, we have shown that the basic results of [14]
generalize from the binary case, where the capacity function of a binary channel was studied in terms
of the subinterval of $[0, 1]$ it determines. The analog here is the polytope the rows of an $n \times n$-stochastic
channel determine.